The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that, by using neural nets, we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order of magnitude in memory over
several real-world data sets. More importantly though, we believe that the idea
of replacing core components of a data management system through learned models
has far-reaching implications for future systems designs and that this work
just provides a glimpse of what might be possible.
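As a minimal sketch of the premise (illustration only, not the paper's recursive model index; the function names and the evenly spaced key set are assumptions), one can fit a linear model mapping key to position in a sorted array and correct the prediction with a bounded local search:

```python
# Sketch: an "index as model". Fit pos ~ a*key + b over a sorted key array,
# record the model's worst-case error, and search only that neighbourhood.
import bisect

def fit_linear(keys):
    """Least-squares line pos ~ a*key + b over the sorted key array."""
    n = len(keys)
    mx = sum(keys) / n
    my = (n - 1) / 2                      # mean of positions 0..n-1
    cov = sum((x - mx) * (y - my) for y, x in enumerate(keys))
    var = sum((x - mx) ** 2 for x in keys)
    a = cov / var if var else 0.0
    return a, my - a * mx

def lookup(keys, key, a, b, max_err):
    """Predict a position, then search only the +/- max_err neighbourhood."""
    guess = min(max(int(a * key + b), 0), len(keys) - 1)
    lo, hi = max(0, guess - max_err), min(len(keys), guess + max_err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else -1

keys = sorted(range(0, 2000, 2))          # 1000 sorted even keys
a, b = fit_linear(keys)
max_err = max(abs(int(a * k + b) - i) for i, k in enumerate(keys))
print(lookup(keys, 1234, a, b, max_err))  # position of key 1234
```

The bounded search after the model prediction is what keeps the lookup correct even when the model is imperfect; here the key distribution is linear, so the error bound is tight.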
Degrees of Guaranteed Envy-Freeness in Finite Bounded Cake-Cutting Protocols
Cake-cutting protocols aim at dividing a ``cake'' (i.e., a divisible
resource) and assigning the resulting portions to several players in a way that
each of the players feels they have received a ``fair'' amount of the cake. An
important notion of fairness is envy-freeness: No player wishes to switch the
portion of the cake received with another player's portion. Despite intense
efforts in the past, it is still an open question whether there is a
\emph{finite bounded} envy-free cake-cutting protocol for an arbitrary number
of players, and even for four players. We introduce the notion of degree of
guaranteed envy-freeness (DGEF) as a measure of how good a cake-cutting
protocol can approximate the ideal of envy-freeness while keeping the protocol
finite bounded (trading being disregarded). We propose a new finite bounded
proportional protocol for any number n \geq 3 of players, and show that this
protocol has a DGEF of 1 + \lceil (n^2)/2 \rceil. This is the currently best
DGEF among known finite bounded cake-cutting protocols for an arbitrary number
of players. We will make the case that improving the DGEF even further is a
tough challenge, and determine, for comparison, the DGEF of selected known
finite bounded cake-cutting protocols. Comment: 37 pages, 4 figures
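The stated bound can be tabulated directly; `dgef_bound` is an illustrative helper name, and n*(n-1) is the total number of ordered player pairs, all of which a fully envy-free protocol would have to guarantee:

```python
# Evaluating the paper's DGEF bound 1 + ceil(n^2 / 2) for small n, against
# the total number of ordered envy relations n*(n-1).
from math import ceil

def dgef_bound(n):
    return 1 + ceil(n * n / 2)

for n in range(3, 8):
    print(n, dgef_bound(n), n * (n - 1))
```

Note that for n = 3 the bound equals n*(n-1) = 6, consistent with envy-free three-player protocols being known; the gap opens up from n = 4 onward.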
Extending the definition of modularity to directed graphs with overlapping communities
Complex networks topologies present interesting and surprising properties,
such as community structures, which can be exploited to optimize communication,
to find new efficient and context-aware routing algorithms or simply to
understand the dynamics and meaning of relationships among nodes. Complex
networks are gaining more and more importance as a reference model and are a
powerful interpretation tool for many different kinds of natural, biological
and social networks, where directed relationships and contextual belonging of
nodes to many different communities are a matter of fact. This paper starts from
the definition of the modularity function, given by M. Newman to evaluate the
goodness of network community decompositions, and extends it to the more
general case of directed graphs with overlapping community structures.
Interesting properties of the proposed extension are discussed, a method for
finding overlapping communities is proposed and results of its application to
benchmark case-studies are reported. We also propose a new dataset which could
be used as a reference benchmark for overlapping community structures
identification. Comment: 22 pages, 11 figures
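For reference, the directed (non-overlapping) modularity of Leicht and Newman, which serves as the starting point for extensions like this one, can be sketched as follows; the overlapping extension itself is not reproduced here, and the convention A[i][j] = 1 for an edge i -> j is an assumption of this sketch:

```python
# Directed modularity Q = (1/m) * sum_ij [A_ij - k_out_i * k_in_j / m]
# over pairs in the same community (Leicht-Newman variant).

def directed_modularity(A, comm):
    n = len(A)
    m = sum(sum(row) for row in A)            # total number of edges
    k_out = [sum(A[i]) for i in range(n)]     # out-degrees
    k_in = [sum(A[i][j] for i in range(n)) for j in range(n)]  # in-degrees
    q = 0.0
    for i in range(n):
        for j in range(n):
            if comm[i] == comm[j]:
                q += A[i][j] - k_out[i] * k_in[j] / m
    return q / m

# Two 2-node communities linked by one edge:
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
print(directed_modularity(A, [0, 0, 1, 1]))
```

The null model k_out_i * k_in_j / m is what distinguishes the directed case from Newman's original undirected formula, which uses k_i * k_j / (2m).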
Quantifying and identifying the overlapping community structure in networks
It has been shown that the communities of complex networks often overlap with
each other. However, there is no effective method to quantify the overlapping
community structure. In this paper, we propose a metric to address this
problem. Instead of assuming that one node can only belong to one community,
our metric assumes that a maximal clique only belongs to one community. In this
way, the overlaps between communities are allowed. To identify the overlapping
community structure, we construct a maximal clique network from the original
network, and prove that the optimization of our metric on the original network
is equivalent to the optimization of Newman's modularity on the maximal clique
network. Thus the overlapping community structure can be identified through
partitioning the maximal clique network using any modularity optimization
method. The effectiveness of our metric is demonstrated by extensive tests on
both the artificial networks and the real world networks with known community
structure. The application to the word association network also reproduces
excellent results.Comment: 9 pages, 7 figure
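The construction's first step can be sketched as follows: enumerate the maximal cliques of a graph (here with plain Bron-Kerbosch) and connect cliques that share nodes, yielding a "maximal clique network" on which any standard modularity optimizer can then be run. The shared-node-count edge weight used here is an assumption of this sketch, not the paper's definition:

```python
# Build the maximal clique network of a graph.

def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; adj maps node -> set of neighbours."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return cliques

def clique_network(adj):
    cliques = maximal_cliques(adj)
    edges = {}
    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            shared = len(cliques[i] & cliques[j])
            if shared:
                edges[(i, j)] = shared      # weight = number of shared nodes
    return cliques, edges

# Two triangles sharing node 2 -> two maximal cliques overlapping in one node.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
cliques, edges = clique_network(adj)
print(sorted(map(sorted, cliques)), edges)
```

A node such as 2 above belongs to both cliques, so after the clique network is partitioned, it can legitimately end up in more than one community, which is exactly the mechanism that permits overlaps.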
Overlapping Community Discovery Methods: A Survey
The detection of overlapping communities is a challenging problem which is
gaining increasing interest in recent years because of the natural tendency of
individuals, observed in real-world networks, to participate in multiple groups
at the same time. This review gives a description of the main proposals in the
field. Besides the methods designed for static networks, some new approaches
that deal with the detection of overlapping communities in networks that change
over time are described. Methods are classified with respect to the underlying
principles guiding them to obtain a network division in groups sharing part of
their nodes. For each of them we also report, when available, computational
complexity and web site address from which it is possible to download the
software implementing the method. Comment: 20 pages; book chapter, appears in
Social Networks: Analysis and Case Studies, A. Gunduz-Oguducu and A. S.
Etaner-Uyar (eds.), Lecture Notes in Social Networks, pp. 105-125, Springer, 201
Stratification of the severity of critically ill patients with classification trees
Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of the probability of hospital mortality; comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR).

Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS), n = 1808, and a Validation Set (VS), n = 808. Their properties of discrimination are compared with the ROC curve (AUC, 95% CI) and the percent of correct classification (PCC, 95% CI); calibration is compared with the calibration curve and the Standardized Mortality Ratio (SMR, 95% CI).

Results: CTs are produced with different selections of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In the VS, all the models achieved acceptable discrimination with AUC above 0.7. CT: CART 0.75 (0.71-0.81), CHAID 0.76 (0.72-0.79) and C4.5 0.76 (0.73-0.80). PCC: CART 72 (69-75), CHAID 72 (69-75) and C4.5 76 (73-79). Calibration (SMR) was better in the CTs: CART 1.04 (0.95-1.31), CHAID 1.06 (0.97-1.15) and C4.5 1.08 (0.98-1.16).

Conclusion: With different CT methodologies, trees are generated with different selections of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. CTs should be taken into account for the classification of the prognosis of critically ill patients.
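As a minimal sketch of the splitting criterion behind CART (one split only; the study grows and validates full trees, and the variable and data below are hypothetical), a node is divided at the threshold that minimises the weighted Gini impurity of the two resulting groups:

```python
# One CART-style binary split: pick the threshold on a single variable
# that minimises the weighted Gini impurity of the two child nodes.

def gini(labels):
    """Gini impurity of a set of 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n               # fraction of positives (e.g. deaths)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for k in range(1, len(pairs)):
        left = [l for _, l in pairs[:k]]
        right = [l for _, l in pairs[k:]]
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if w < best[1]:
            best = (pairs[k][0], w)
    return best

ages = [35, 40, 55, 62, 70, 80]       # hypothetical predictor values
died = [0, 0, 0, 1, 1, 1]             # hypothetical hospital-mortality labels
print(best_split(ages, died))         # splits cleanly at age 62
```

CHAID and C4.5 differ mainly in the split criterion (chi-square tests and information gain ratio, respectively), which is one reason the three methodologies select different variables and rule sets from the same data.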
Financial time series prediction using spiking neural networks
In this paper a novel application of a particular type of spiking neural network, a Polychronous Spiking Network, was used for financial time series prediction. It is argued that the inherent temporal capabilities of this type of network are suited to non-stationary data such as this. The performance of the spiking neural network was benchmarked against three systems: two "traditional" rate-encoded neural networks (a Multi-Layer Perceptron and a Dynamic Ridge Polynomial neural network) and a standard Linear Predictor Coefficients model. For this comparison three non-stationary and noisy time series were used: IBM stock data, US/Euro exchange rate data, and the price of Brent crude oil. The experiments demonstrated favourable prediction results for the Spiking Neural Network in terms of Annualised Return and prediction error for 5-step-ahead predictions. These results were also supported by other relevant metrics such as Maximum Drawdown and Signal-To-Noise Ratio. This work demonstrated the applicability of the Polychronous Spiking Network to financial data forecasting, which in turn indicates the potential of using such networks over traditional systems in difficult-to-manage non-stationary environments. © 2014 Reid et al.
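Polychronous spiking networks are typically built from Izhikevich-model neurons connected with axonal conduction delays; as a minimal illustration, the sketch below simulates one such neuron (regular-spiking parameters) under constant input. The network, delay machinery, and forecasting pipeline of the paper are not reproduced here:

```python
# One Izhikevich neuron, forward-Euler at 1 ms resolution:
#   v' = 0.04*v^2 + 5*v + 140 - u + I,   u' = a*(b*v - u),
# with reset v <- c, u <- u + d whenever v reaches the 30 mV peak.

def izhikevich(I, ms, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Return spike times (ms) of an Izhikevich neuron driven by current I."""
    v, u = -65.0, -65.0 * b               # start at rest
    spikes = []
    for t in range(ms):
        v += 0.04 * v * v + 5 * v + 140 - u + I   # membrane potential
        u += a * (b * v - u)                      # recovery variable
        if v >= 30.0:                             # spike peak reached
            spikes.append(t)
            v, u = c, u + d                       # after-spike reset
    return spikes

print(len(izhikevich(I=10.0, ms=1000)))  # spike count over one second
```

Polychronization arises when heterogeneous delays make specific groups of such neurons fire in reproducible time-locked patterns; it is those spatio-temporal patterns that give the network the temporal memory the paper exploits for non-stationary series.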
A Survey of Bayesian Statistical Approaches for Big Data
The modern era is characterised as an era of information or Big Data. This
has motivated a huge literature on new methods for extracting information and
insights from these data. A natural question is how these approaches differ
from those that were available prior to the advent of Big Data. We present a
review of published studies proposing Bayesian statistical approaches
specifically for Big Data and discuss the reported and perceived benefits of
these approaches. We conclude by addressing the question of whether focusing
only on improving computational algorithms and infrastructure will be enough to
face the challenges of Big Data.
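One reason Bayesian methods are attractive for streaming Big Data is that, with a conjugate prior, the posterior updates in constant time per observation rather than requiring a pass over all data. A Beta-Bernoulli sketch (purely illustrative; the surveyed literature covers far richer models and algorithms):

```python
# Sequential Bayesian updating: fold batches of 0/1 observations into a
# Beta(alpha, beta) posterior one batch at a time.

def update(alpha, beta, batch):
    """Conjugate update: add successes to alpha, failures to beta."""
    s = sum(batch)
    return alpha + s, beta + len(batch) - s

alpha, beta = 1.0, 1.0          # uniform prior on the success probability
for batch in ([1, 1, 0], [1, 0, 1, 1], [0, 1]):
    alpha, beta = update(alpha, beta, batch)

print(alpha / (alpha + beta))   # posterior mean after all batches
```

Because the posterior after each batch serves as the prior for the next, the result is identical to a single batch fit on the pooled data, which is the property that makes such updates natural for data that arrive as a stream.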